{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "f8ffbea7",
   "metadata": {},
   "source": [
    "# Working with missing data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7e3ab093",
   "metadata": {},
   "source": [
    "In this section, we will discuss missing (also referred to as `NA`) values in cudf. cudf supports having missing values in all dtypes. These missing values are represented by `<NA>`. These values are also referenced as \"null values\"."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8d657a82",
   "metadata": {},
   "source": [
    "## How to Detect missing values"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9ea9f672",
   "metadata": {},
   "source": [
    "To detect missing values, you can use `isna()` and `notna()` functions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "58050adb",
   "metadata": {},
   "outputs": [],
   "source": [
    "import cudf\n",
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "416d73da",
   "metadata": {},
   "outputs": [],
   "source": [
    "df = cudf.DataFrame({\"a\": [1, 2, None, 4], \"b\": [0.1, None, 2.3, 17.17]})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "5dfc6bc3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>2.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>17.17</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      a      b\n",
       "0     1    0.1\n",
       "1     2   <NA>\n",
       "2  <NA>    2.3\n",
       "3     4  17.17"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "4d7f7a6d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       a      b\n",
       "0  False  False\n",
       "1  False   True\n",
       "2   True  False\n",
       "3  False  False"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.isna()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "40edca67",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     True\n",
       "1     True\n",
       "2    False\n",
       "3     True\n",
       "Name: a, dtype: bool"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"a\"].notna()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "acdf29d7",
   "metadata": {},
   "source": [
    "One has to be mindful that in Python (and NumPy), the nan's don't compare equal, but None's do. Note that cudf/NumPy uses the fact that `np.nan != np.nan`, and treats `None` like `np.nan`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "c269c1f5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "None == None"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "99fb083a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.nan == np.nan"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4fdb8bc7",
   "metadata": {},
   "source": [
    "So as compared to above, a scalar equality comparison versus a None/np.nan doesn't provide useful information."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "630ef6bb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    False\n",
       "1     <NA>\n",
       "2    False\n",
       "3    False\n",
       "Name: b, dtype: bool"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"b\"] == np.nan"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "8162e383",
   "metadata": {},
   "outputs": [],
   "source": [
    "s = cudf.Series([None, 1, 2])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "199775b3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    <NA>\n",
       "1       1\n",
       "2       2\n",
       "dtype: int64"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "cd09d80c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    <NA>\n",
       "1    <NA>\n",
       "2    <NA>\n",
       "dtype: bool"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s == None"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "6b23bb0c",
   "metadata": {},
   "outputs": [],
   "source": [
    "s = cudf.Series([1, 2, np.nan], nan_as_null=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "cafb79ee",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    1.0\n",
       "1    2.0\n",
       "2    NaN\n",
       "dtype: float64"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "13363897",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    False\n",
       "1    False\n",
       "2    False\n",
       "dtype: bool"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s == np.nan"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "208a3776",
   "metadata": {},
   "source": [
    "## Float dtypes and missing data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2c174b88",
   "metadata": {},
   "source": [
    "Because ``NaN`` is a float, a column of integers with even one missing values is cast to floating-point dtype. However this doesn't happen by default.\n",
    "\n",
    "By default if a ``NaN`` value is passed to `Series` constructor, it is treated as `<NA>` value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "c59c3c54",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0       1\n",
       "1       2\n",
       "2    <NA>\n",
       "dtype: int64"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cudf.Series([1, 2, np.nan])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a9eb2d9c",
   "metadata": {},
   "source": [
    "Hence to consider a ``NaN`` as ``NaN`` you will have to pass `nan_as_null=False` parameter into `Series` constructor."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "ecc5ae92",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    1.0\n",
       "1    2.0\n",
       "2    NaN\n",
       "dtype: float64"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cudf.Series([1, 2, np.nan], nan_as_null=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d1db7b08",
   "metadata": {},
   "source": [
    "## Datetimes"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "548d3734",
   "metadata": {},
   "source": [
    "For `datetime64` types, cudf doesn't support having `NaT` values. Instead these values which are specific to numpy and pandas are considered as null values(`<NA>`) in cudf. The actual underlying value of `NaT` is `min(int64)` and cudf retains the underlying value when converting a cudf object to pandas object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "de70f244",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    2012-01-01 00:00:00.000000\n",
       "1                          <NA>\n",
       "2    2012-01-01 00:00:00.000000\n",
       "dtype: datetime64[us]"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "datetime_series = cudf.Series(\n",
    "    [pd.Timestamp(\"20120101\"), pd.NaT, pd.Timestamp(\"20120101\")]\n",
    ")\n",
    "datetime_series"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "8411a914",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0   2012-01-01\n",
       "1          NaT\n",
       "2   2012-01-01\n",
       "dtype: datetime64[ns]"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "datetime_series.to_pandas()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "df664145",
   "metadata": {},
   "source": [
    "any operations on rows having `<NA>` values in `datetime` column will result in `<NA>` value at the same location in resulting column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "829c32d0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    0 days 00:00:00\n",
       "1               <NA>\n",
       "2    0 days 00:00:00\n",
       "dtype: timedelta64[us]"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "datetime_series - datetime_series"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aa8031ef",
   "metadata": {},
   "source": [
    "## Calculations with missing data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c587fae2",
   "metadata": {},
   "source": [
    "Null values propagate naturally through arithmetic operations between pandas objects."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "f8f2aec7",
   "metadata": {},
   "outputs": [],
   "source": [
    "df1 = cudf.DataFrame(\n",
    "    {\n",
    "        \"a\": [1, None, 2, 3, None],\n",
    "        \"b\": cudf.Series([np.nan, 2, 3.2, 0.1, 1], nan_as_null=False),\n",
    "    }\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "0c8a3011",
   "metadata": {},
   "outputs": [],
   "source": [
    "df2 = cudf.DataFrame(\n",
    "    {\"a\": [1, 11, 2, 34, 10], \"b\": cudf.Series([0.23, 22, 3.2, None, 1])}\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "052f6c2b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>3.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>0.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      a    b\n",
       "0     1  NaN\n",
       "1  <NA>  2.0\n",
       "2     2  3.2\n",
       "3     3  0.1\n",
       "4  <NA>  1.0"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "0fb0a083",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0.23</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>11</td>\n",
       "      <td>22.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>3.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>34</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>10</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    a     b\n",
       "0   1  0.23\n",
       "1  11  22.0\n",
       "2   2   3.2\n",
       "3  34  <NA>\n",
       "4  10   1.0"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "6f8152c0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>24.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>4</td>\n",
       "      <td>6.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>37</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      a     b\n",
       "0     2   NaN\n",
       "1  <NA>  24.0\n",
       "2     4   6.4\n",
       "3    37  <NA>\n",
       "4  <NA>   2.0"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1 + df2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "11170d49",
   "metadata": {},
   "source": [
    "While summing the data along a series, `NA` values will be treated as `0`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "45081790",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0       1\n",
       "1    <NA>\n",
       "2       2\n",
       "3       3\n",
       "4    <NA>\n",
       "Name: a, dtype: int64"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1[\"a\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "39922658",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1[\"a\"].sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6e99afe0",
   "metadata": {},
   "source": [
    "Since `NA` values are treated as `0`, the mean would result to 2 in this case `(1 + 0 + 2 + 3 + 0)/5 = 2`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "b2f16ddb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2.0"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1[\"a\"].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "07f2ec5a",
   "metadata": {},
   "source": [
    "To preserve `NA` values in the above calculations, `sum` & `mean` support `skipna` parameter.\n",
    "By default it's value is\n",
    "set to `True`, we can change it to `False` to preserve `NA` values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "d4a463a0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "nan"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1[\"a\"].sum(skipna=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "a944c42e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "nan"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1[\"a\"].mean(skipna=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fb8c8f18",
   "metadata": {},
   "source": [
    "Cumulative methods like `cumsum` and `cumprod` ignore `NA` values by default."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "4f2a7306",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0       1\n",
       "1    <NA>\n",
       "2       3\n",
       "3       6\n",
       "4    <NA>\n",
       "Name: a, dtype: int64"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1[\"a\"].cumsum()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c8f6054b",
   "metadata": {},
   "source": [
    "To preserve `NA` values in cumulative methods, provide `skipna=False`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "d4c46776",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0       1\n",
       "1    <NA>\n",
       "2    <NA>\n",
       "3    <NA>\n",
       "4    <NA>\n",
       "Name: a, dtype: int64"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1[\"a\"].cumsum(skipna=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "67077d65",
   "metadata": {},
   "source": [
    "## Sum/product of Null/nans"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ffbb9ca1",
   "metadata": {},
   "source": [
    "The sum of an empty or all-NA Series of a DataFrame is 0."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "f430c9ce",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.0"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cudf.Series([np.nan], nan_as_null=False).sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "7fde514b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "nan"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cudf.Series([np.nan], nan_as_null=False).sum(skipna=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "56cedd17",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.0"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cudf.Series([], dtype=\"float64\").sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cb188adb",
   "metadata": {},
   "source": [
    "The product of an empty or all-NA Series of a DataFrame is 1."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "d20bbbef",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1.0"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cudf.Series([np.nan], nan_as_null=False).prod()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "id": "75abbcfa",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "nan"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cudf.Series([np.nan], nan_as_null=False).prod(skipna=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "becce0cc",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1.0"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cudf.Series([], dtype=\"float64\").prod()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0e899e03",
   "metadata": {},
   "source": [
    "## NA values in GroupBy"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7fb20874",
   "metadata": {},
   "source": [
    "`NA` groups in GroupBy are automatically excluded. For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "1379037c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>3.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>0.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      a    b\n",
       "0     1  NaN\n",
       "1  <NA>  2.0\n",
       "2     2  3.2\n",
       "3     3  0.1\n",
       "4  <NA>  1.0"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "id": "d6b91e6f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>b</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     b\n",
       "a     \n",
       "2  3.2\n",
       "1  NaN\n",
       "3  0.1"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1.groupby(\"a\").mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cb83fb11",
   "metadata": {},
   "source": [
    "It is also possible to include `NA` in groups by passing `dropna=False`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "id": "768c3e50",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>b</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>&lt;NA&gt;</th>\n",
       "      <td>1.5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        b\n",
       "a        \n",
       "2     3.2\n",
       "1     NaN\n",
       "3     0.1\n",
       "<NA>  1.5"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1.groupby(\"a\", dropna=False).mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "133816b4",
   "metadata": {},
   "source": [
    "## Inserting missing data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "306082ad",
   "metadata": {},
   "source": [
    "All dtypes support insertion of missing value by assignment. Any specific location in series can made null by assigning it to `None`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "id": "7ddde1fe",
   "metadata": {},
   "outputs": [],
   "source": [
    "series = cudf.Series([1, 2, 3, 4])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "id": "16e54597",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    1\n",
       "1    2\n",
       "2    3\n",
       "3    4\n",
       "dtype: int64"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "series"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "id": "f628f94d",
   "metadata": {},
   "outputs": [],
   "source": [
    "series[2] = None"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "id": "b30590b7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0       1\n",
       "1       2\n",
       "2    <NA>\n",
       "3       4\n",
       "dtype: int64"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "series"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a1b123d0",
   "metadata": {},
   "source": [
    "## Filling missing values: fillna"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "114aa23a",
   "metadata": {},
   "source": [
    "`fillna()` can fill in `NA` & `NaN` values with non-NA data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "id": "59e22668",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>3.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>0.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      a    b\n",
       "0     1  NaN\n",
       "1  <NA>  2.0\n",
       "2     2  3.2\n",
       "3     3  0.1\n",
       "4  <NA>  1.0"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "id": "05c221ee",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    10.0\n",
       "1     2.0\n",
       "2     3.2\n",
       "3     0.1\n",
       "4     1.0\n",
       "Name: b, dtype: float64"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1[\"b\"].fillna(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "401f91b2",
   "metadata": {},
   "source": [
    "## Filling with cudf Object"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e79346d6",
   "metadata": {},
   "source": [
    "You can also fillna using a dict or Series that is alignable. The labels of the dict or index of the Series must match the columns of the frame you wish to fill. The use case of this is to fill a DataFrame with the mean of that column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "id": "f52c5d8f",
   "metadata": {},
   "outputs": [],
   "source": [
    "import cupy as cp\n",
    "\n",
    "cp_rng = cp.random.default_rng()\n",
    "\n",
    "dff = cudf.DataFrame(cp_rng.standard_normal((10, 3)), columns=list(\"ABC\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "id": "6affebe9",
   "metadata": {},
   "outputs": [],
   "source": [
    "dff.iloc[3:5, 0] = np.nan"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "id": "1ce1b96f",
   "metadata": {},
   "outputs": [],
   "source": [
    "dff.iloc[4:6, 1] = np.nan"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "id": "90829195",
   "metadata": {},
   "outputs": [],
   "source": [
    "dff.iloc[5:8, 2] = np.nan"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "id": "c0feac14",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>-0.408268</td>\n",
       "      <td>-0.676643</td>\n",
       "      <td>-1.274743</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>-0.029322</td>\n",
       "      <td>-0.873593</td>\n",
       "      <td>-1.214105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>-0.866371</td>\n",
       "      <td>1.081735</td>\n",
       "      <td>-0.226840</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>NaN</td>\n",
       "      <td>0.812278</td>\n",
       "      <td>1.074973</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>-0.366725</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>-1.016239</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>0.675123</td>\n",
       "      <td>1.067536</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0.221568</td>\n",
       "      <td>2.025961</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>-0.317241</td>\n",
       "      <td>1.011275</td>\n",
       "      <td>0.674891</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>-0.877041</td>\n",
       "      <td>-1.919394</td>\n",
       "      <td>-1.029201</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          A         B         C\n",
       "0 -0.408268 -0.676643 -1.274743\n",
       "1 -0.029322 -0.873593 -1.214105\n",
       "2 -0.866371  1.081735 -0.226840\n",
       "3       NaN  0.812278  1.074973\n",
       "4       NaN       NaN -0.366725\n",
       "5 -1.016239       NaN       NaN\n",
       "6  0.675123  1.067536       NaN\n",
       "7  0.221568  2.025961       NaN\n",
       "8 -0.317241  1.011275  0.674891\n",
       "9 -0.877041 -1.919394 -1.029201"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dff"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "id": "a07c1260",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>-0.408268</td>\n",
       "      <td>-0.676643</td>\n",
       "      <td>-1.274743</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>-0.029322</td>\n",
       "      <td>-0.873593</td>\n",
       "      <td>-1.214105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>-0.866371</td>\n",
       "      <td>1.081735</td>\n",
       "      <td>-0.226840</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>-0.327224</td>\n",
       "      <td>0.812278</td>\n",
       "      <td>1.074973</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>-0.327224</td>\n",
       "      <td>0.316145</td>\n",
       "      <td>-0.366725</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>-1.016239</td>\n",
       "      <td>0.316145</td>\n",
       "      <td>-0.337393</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>0.675123</td>\n",
       "      <td>1.067536</td>\n",
       "      <td>-0.337393</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0.221568</td>\n",
       "      <td>2.025961</td>\n",
       "      <td>-0.337393</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>-0.317241</td>\n",
       "      <td>1.011275</td>\n",
       "      <td>0.674891</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>-0.877041</td>\n",
       "      <td>-1.919394</td>\n",
       "      <td>-1.029201</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          A         B         C\n",
       "0 -0.408268 -0.676643 -1.274743\n",
       "1 -0.029322 -0.873593 -1.214105\n",
       "2 -0.866371  1.081735 -0.226840\n",
       "3 -0.327224  0.812278  1.074973\n",
       "4 -0.327224  0.316145 -0.366725\n",
       "5 -1.016239  0.316145 -0.337393\n",
       "6  0.675123  1.067536 -0.337393\n",
       "7  0.221568  2.025961 -0.337393\n",
       "8 -0.317241  1.011275  0.674891\n",
       "9 -0.877041 -1.919394 -1.029201"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dff.fillna(dff.mean())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "id": "9e70d61a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>-0.408268</td>\n",
       "      <td>-0.676643</td>\n",
       "      <td>-1.274743</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>-0.029322</td>\n",
       "      <td>-0.873593</td>\n",
       "      <td>-1.214105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>-0.866371</td>\n",
       "      <td>1.081735</td>\n",
       "      <td>-0.226840</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>NaN</td>\n",
       "      <td>0.812278</td>\n",
       "      <td>1.074973</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>NaN</td>\n",
       "      <td>0.316145</td>\n",
       "      <td>-0.366725</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>-1.016239</td>\n",
       "      <td>0.316145</td>\n",
       "      <td>-0.337393</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>0.675123</td>\n",
       "      <td>1.067536</td>\n",
       "      <td>-0.337393</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0.221568</td>\n",
       "      <td>2.025961</td>\n",
       "      <td>-0.337393</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>-0.317241</td>\n",
       "      <td>1.011275</td>\n",
       "      <td>0.674891</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>-0.877041</td>\n",
       "      <td>-1.919394</td>\n",
       "      <td>-1.029201</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          A         B         C\n",
       "0 -0.408268 -0.676643 -1.274743\n",
       "1 -0.029322 -0.873593 -1.214105\n",
       "2 -0.866371  1.081735 -0.226840\n",
       "3       NaN  0.812278  1.074973\n",
       "4       NaN  0.316145 -0.366725\n",
       "5 -1.016239  0.316145 -0.337393\n",
       "6  0.675123  1.067536 -0.337393\n",
       "7  0.221568  2.025961 -0.337393\n",
       "8 -0.317241  1.011275  0.674891\n",
       "9 -0.877041 -1.919394 -1.029201"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dff.fillna(dff.mean()[1:3])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0ace728d",
   "metadata": {},
   "source": [
    "## Dropping axis labels with missing data: dropna"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2ccd7115",
   "metadata": {},
   "source": [
    "Missing data can be excluded using `dropna()`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "id": "98c57be7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>3.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>0.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      a    b\n",
       "0     1  NaN\n",
       "1  <NA>  2.0\n",
       "2     2  3.2\n",
       "3     3  0.1\n",
       "4  <NA>  1.0"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "id": "bc3f273a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>3.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>0.1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   a    b\n",
       "2  2  3.2\n",
       "3  3  0.1"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1.dropna(axis=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "id": "a48d4de0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Empty DataFrame\n",
       "Columns: []\n",
       "Index: [0, 1, 2, 3, 4]"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1.dropna(axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0b1954f9",
   "metadata": {},
   "source": [
    "An equivalent `dropna()` is available for Series."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "id": "2dd8f660",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    1\n",
       "2    2\n",
       "3    3\n",
       "Name: a, dtype: int64"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1[\"a\"].dropna()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "121eb6d7",
   "metadata": {},
   "source": [
    "## Replacing generic values"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3cc4c5f1",
   "metadata": {},
   "source": [
    "Often times we want to replace arbitrary values with other values.\n",
    "\n",
    "`replace()` in Series and `replace()` in DataFrame provides an efficient yet flexible way to perform such replacements."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "id": "e6c14e8a",
   "metadata": {},
   "outputs": [],
   "source": [
    "series = cudf.Series([0.0, 1.0, 2.0, 3.0, 4.0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "id": "a852f0cb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    0.0\n",
       "1    1.0\n",
       "2    2.0\n",
       "3    3.0\n",
       "4    4.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "series"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "id": "f6ac12eb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    5.0\n",
       "1    1.0\n",
       "2    2.0\n",
       "3    3.0\n",
       "4    4.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "series.replace(0, 5)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a6e1b6d7",
   "metadata": {},
   "source": [
    "We can also replace any value with a `<NA>` value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "id": "f0156bff",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    <NA>\n",
       "1     1.0\n",
       "2     2.0\n",
       "3     3.0\n",
       "4     4.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "series.replace(0, None)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6673eefb",
   "metadata": {},
   "source": [
    "You can replace a list of values by a list of other values:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "id": "f3110f5b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    4.0\n",
       "1    3.0\n",
       "2    2.0\n",
       "3    1.0\n",
       "4    0.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "series.replace([0, 1, 2, 3, 4], [4, 3, 2, 1, 0])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "61521e8b",
   "metadata": {},
   "source": [
    "You can also specify a mapping dict:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "id": "45862d05",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     10.0\n",
       "1    100.0\n",
       "2      2.0\n",
       "3      3.0\n",
       "4      4.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "series.replace({0: 10, 1: 100})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "04a34549",
   "metadata": {},
   "source": [
    "For a DataFrame, you can specify individual values by column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "id": "348caa64",
   "metadata": {},
   "outputs": [],
   "source": [
    "df = cudf.DataFrame({\"a\": [0, 1, 2, 3, 4], \"b\": [5, 6, 7, 8, 9]})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "id": "cca41ec4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   a  b\n",
       "0  0  5\n",
       "1  1  6\n",
       "2  2  7\n",
       "3  3  8\n",
       "4  4  9"
      ]
     },
     "execution_count": 65,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "id": "64334693",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>100</td>\n",
       "      <td>100</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     a    b\n",
       "0  100  100\n",
       "1    1    6\n",
       "2    2    7\n",
       "3    3    8\n",
       "4    4    9"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.replace({\"a\": 0, \"b\": 5}, 100)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2f0ceec7",
   "metadata": {},
   "source": [
    "## String/regular expression replacement"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c6f44740",
   "metadata": {},
   "source": [
    "cudf supports replacing string values using `replace` API:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "id": "031d3533",
   "metadata": {},
   "outputs": [],
   "source": [
    "d = {\"a\": list(range(4)), \"b\": list(\"ab..\"), \"c\": [\"a\", \"b\", None, \"d\"]}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "id": "12b41efb",
   "metadata": {},
   "outputs": [],
   "source": [
    "df = cudf.DataFrame(d)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "id": "d450df49",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>a</td>\n",
       "      <td>a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>b</td>\n",
       "      <td>b</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>.</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>.</td>\n",
       "      <td>d</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   a  b     c\n",
       "0  0  a     a\n",
       "1  1  b     b\n",
       "2  2  .  <NA>\n",
       "3  3  .     d"
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "id": "f823bc46",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>a</td>\n",
       "      <td>a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>b</td>\n",
       "      <td>b</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>A Dot</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>A Dot</td>\n",
       "      <td>d</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   a      b     c\n",
       "0  0      a     a\n",
       "1  1      b     b\n",
       "2  2  A Dot  <NA>\n",
       "3  3  A Dot     d"
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.replace(\".\", \"A Dot\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "id": "bc52f6e9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>a</td>\n",
       "      <td>a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>A Dot</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>A Dot</td>\n",
       "      <td>d</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   a      b     c\n",
       "0  0      a     a\n",
       "1  1   <NA>  <NA>\n",
       "2  2  A Dot  <NA>\n",
       "3  3  A Dot     d"
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.replace([\".\", \"b\"], [\"A Dot\", None])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7c1087be",
   "metadata": {},
   "source": [
    "Replace a few different values (list -> list):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "id": "7e23eba9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>b</td>\n",
       "      <td>b</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>b</td>\n",
       "      <td>b</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>--</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>--</td>\n",
       "      <td>d</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   a   b     c\n",
       "0  0   b     b\n",
       "1  1   b     b\n",
       "2  2  --  <NA>\n",
       "3  3  --     d"
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.replace([\"a\", \".\"], [\"b\", \"--\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "42845a9c",
   "metadata": {},
   "source": [
    "Only search in column 'b' (dict -> dict):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "id": "d2e79805",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>a</td>\n",
       "      <td>a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>b</td>\n",
       "      <td>b</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>replacement value</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>replacement value</td>\n",
       "      <td>d</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   a                  b     c\n",
       "0  0                  a     a\n",
       "1  1                  b     b\n",
       "2  2  replacement value  <NA>\n",
       "3  3  replacement value     d"
      ]
     },
     "execution_count": 73,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.replace({\"b\": \".\"}, {\"b\": \"replacement value\"})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "774b42a6",
   "metadata": {},
   "source": [
    "## Numeric replacement"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1c1926ac",
   "metadata": {},
   "source": [
    "`replace()` can also be used similar to `fillna()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "id": "355a2f0d",
   "metadata": {},
   "outputs": [],
   "source": [
    "df = cudf.DataFrame(cp_rng.standard_normal((10, 2)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "id": "d9eed372",
   "metadata": {},
   "outputs": [],
   "source": [
    "df[rng.random(df.shape[0]) > 0.5] = 1.5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "id": "ae944244",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>-0.089358787</td>\n",
       "      <td>-0.728419386</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>-2.141612003</td>\n",
       "      <td>-0.574415182</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.774643462</td>\n",
       "      <td>2.07287721</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.93799853</td>\n",
       "      <td>-1.054129436</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>-0.435293012</td>\n",
       "      <td>1.163009584</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>1.346623287</td>\n",
       "      <td>0.31961371</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              0             1\n",
       "0  -0.089358787  -0.728419386\n",
       "1  -2.141612003  -0.574415182\n",
       "2          <NA>          <NA>\n",
       "3   0.774643462    2.07287721\n",
       "4    0.93799853  -1.054129436\n",
       "5          <NA>          <NA>\n",
       "6  -0.435293012   1.163009584\n",
       "7   1.346623287    0.31961371\n",
       "8          <NA>          <NA>\n",
       "9          <NA>          <NA>"
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.replace(1.5, None)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0f32607c",
   "metadata": {},
   "source": [
    "Replacing more than one value is possible by passing a list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "id": "59b81c60",
   "metadata": {},
   "outputs": [],
   "source": [
    "df00 = df.iloc[0, 0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "id": "01a71d4c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>10.000000</td>\n",
       "      <td>-0.728419</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>-2.141612</td>\n",
       "      <td>-0.574415</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>5.000000</td>\n",
       "      <td>5.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.774643</td>\n",
       "      <td>2.072877</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.937999</td>\n",
       "      <td>-1.054129</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>5.000000</td>\n",
       "      <td>5.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>-0.435293</td>\n",
       "      <td>1.163010</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>1.346623</td>\n",
       "      <td>0.319614</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>5.000000</td>\n",
       "      <td>5.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>5.000000</td>\n",
       "      <td>5.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           0         1\n",
       "0  10.000000 -0.728419\n",
       "1  -2.141612 -0.574415\n",
       "2   5.000000  5.000000\n",
       "3   0.774643  2.072877\n",
       "4   0.937999 -1.054129\n",
       "5   5.000000  5.000000\n",
       "6  -0.435293  1.163010\n",
       "7   1.346623  0.319614\n",
       "8   5.000000  5.000000\n",
       "9   5.000000  5.000000"
      ]
     },
     "execution_count": 78,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.replace([1.5, df00], [5, 10])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1080e97b",
   "metadata": {},
   "source": [
    "You can also operate on the DataFrame in place:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "id": "5f0859d7",
   "metadata": {},
   "outputs": [],
   "source": [
    "df.replace(1.5, None, inplace=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "id": "5cf28369",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>-0.089358787</td>\n",
       "      <td>-0.728419386</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>-2.141612003</td>\n",
       "      <td>-0.574415182</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.774643462</td>\n",
       "      <td>2.07287721</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.93799853</td>\n",
       "      <td>-1.054129436</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>-0.435293012</td>\n",
       "      <td>1.163009584</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>1.346623287</td>\n",
       "      <td>0.31961371</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              0             1\n",
       "0  -0.089358787  -0.728419386\n",
       "1  -2.141612003  -0.574415182\n",
       "2          <NA>          <NA>\n",
       "3   0.774643462    2.07287721\n",
       "4    0.93799853  -1.054129436\n",
       "5          <NA>          <NA>\n",
       "6  -0.435293012   1.163009584\n",
       "7   1.346623287    0.31961371\n",
       "8          <NA>          <NA>\n",
       "9          <NA>          <NA>"
      ]
     },
     "execution_count": 80,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
